CPS222 Lecture: Maps; Binary Search Trees          Last revised 1/25/2013

Objectives

1. To review the general concept of a map
2. To define binary search trees
3. To show how to perform operations on BSTs

Materials

1. Code for BST algorithms to project

I. Introduction - Maps
-  ------------ - ----

   A. One kind of data structure that shows up in many places is some form
      of search structure, or map.  Conceptually, such a structure is a
      collection of (key, value) pairs that can be accessed by key.

   B. Such a structure typically supports operations for insertion, lookup,
      and removal of entries - though in some problems the contents of the
      map may be fixed so that only lookup is needed (in which case a
      different implementation may be used).  These operations take the
      following form:

      1. Insertion:
                              __________________
         Key, value          |       Map        |
         ---------->         | (key,value pairs)|
                             |__________________|

      2. Lookup:
                              ___________
         Key                 |    Map    |   Value
         ---------->         |           |  ------->
                             |___________|

      3. Deletion:
                              ____________________________
         Key                 |            Map             |
         ---------->         | (key and its value removed)|
                             |____________________________|

   C. There are actually quite a number of ways that such a structure can be
      implemented.

      1. Pile (unordered array).

         Lookup and removal are O(n) (and so is insertion, if it must first
         check for a duplicate key), so a pile is suitable only for small
         sizes - where its simplicity may actually make it desirable.

      2. Ordered array

         Insertion and removal are O(n), but lookup is O(log n) since binary
         search can be used.  Suitable only when contents are unchanging
         (e.g. table of keywords) - where its simplicity may make it
         desirable.

      3. Linked list - unordered

         Insertion is O(1), but lookup is O(n).  Deletion is O(n) to find
         the "victim" but then only O(1) to actually remove it.  Suitable
         only in situations where insertion is the dominant operation (e.g.
         archival storage of information that is seldom actually referenced).

      4. Binary search tree - we will discuss today

      5. Hash table - we will discuss later in the week

II. Binary Search Trees
--  ------ ------ -----

   A. One of the two main ways to implement a map of significant size that
      must support modification is to use a binary search tree.

   B. Definition: a binary search tree is a binary tree in which each node
      contains a value (called a key) that is a member of a well-ordered
      set.  Further, if p is a pointer to a node, then
      p -> _key >= every key in the node's left subtree, and
      p -> _key <= every key in the node's right subtree.

   C. Observe: if one traverses a binary search tree in inorder, the nodes
      are visited in ascending order of the keys.

      Ex:
                      DOG
                    /     \
                BISON     FOX
               /     \
         AARDVARK    CAT

      is a binary search tree.  Its inorder traversal is:

         AARDVARK BISON CAT DOG FOX

III. Operations on Binary Search Trees
---  ---------- -- ------ ------ -----

   A. The utility of binary search trees comes from the fact that the
      operations of insert, lookup, and delete the node containing a certain
      key all take time proportional to the height of the tree.

      1. If the tree is well balanced, then its height will be proportional
         to the logarithm of the number of nodes.

         a. Observe that, in a perfect binary tree, there are twice as many
            nodes at each level as there are at the preceding level (since
            each node has two children).  Thus, the number of nodes in the
            tree grows as 2^height - which makes the height proportional to
            log number of nodes.  (You will develop a more formal proof of
            this for a homework.)

         b. If keys are inserted into a binary tree in random order, the
            resultant tree will not be perfect, of course; but the height
            will still, on average, be proportional to log n.  (This can be
            shown experimentally.)

      2. To see the utility of this, we can compare the average number of
         steps needed for various operations on various search structures,
         assuming that, in each case, the structure contains 1000 elements:

         structure              insert    delete    lookup

         pile                        1       500       500
         ordered array             500       500        10  [binary search]
         (unordered)                 1       500       500
          linked list
         binary search              10        10        10
          tree - if balanced

   B. Algorithms for binary search trees:

      1. Finding a node containing a given key:

         PROJECT Code for lookup (recursive and non-recursive versions)

         Observe: This algorithm (in either form) requires a number of
         steps proportional to the height of the tree.

      2. Inserting a new key - simplest form:

         PROJECT Code for insert (recursive and non-recursive versions)

         a. Observe: This algorithm (in either form) requires a number of
            steps proportional to the height of the tree.

         b. Observe that this insertion algorithm, while very simple, could
            lead to a highly non-optimal tree.

            Ex: Consider what happens if keys are presented in reverse
                order:

                            FOX
                           /
                        DOG
                       /
                    CAT
                   /
               BISON
              /
         AARDVARK

            But note that the same thing happens when they are presented in
            forward order!

         AARDVARK
              \
               BISON
                   \
                    CAT
                       \
                        DOG
                           \
                            FOX

            In both cases the height of the tree equals the number of
            nodes, so every operation degrades to O(n).

         c. When we come to balanced binary search trees in a week or so,
            we will see that there are several relatively simple ways to
            avoid such problems - leading to the ability to guarantee that
            the height of the tree will never be more than a fixed (small)
            multiple of log n.

      3. Deletion: This is a bit more complex than the other two operations.

         a. If the node we are removing has no children, it can be deleted
            and the pointer to it in its parent can be set to NULL.

         b. If the node has one child, that one child can become the child
            of the parent of the removed node (the grandparent adopts the
            grandchild).

         c. But if the node being removed has two children, life is more
            complex.  Our basic goal is to guarantee that the resultant
            tree has the same inorder traversal as the original tree -
            minus the removed node.

            Observe: let the inorder traversal of the original tree be as
            follows, where D is the node being removed, P is its inorder
            predecessor, and S is its inorder successor:

               ... P D S ...

            What we want is a tree that traverses as follows:

               ... P S ...

            Observe: P is in D's left subtree, and S is in D's right
            subtree.

            Observe: P cannot have a right child, and S cannot have a left
            child.  (D is the inorder successor of P, and is above P in the
            tree.  If P had a right child, P's inorder successor would lie
            in P's right subtree.  A similar argument holds for S.)

            Therefore: what we can do is arbitrarily choose either P or S,
            copy its data up to node D, and then remove P or S as the case
            may be.  (Since P and S each have at most one child, removing
            either one falls under case a or case b above.)

         PROJECT Code for remove (recursive implementation only)

         Observe: This algorithm requires a number of steps proportional to
         the height of the tree.
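
The code projected in class is not reproduced in these notes.  As a rough
illustration of the three algorithms described above, here is a minimal
sketch in C++.  Everything in it is an assumption made for illustration,
not the code used in lecture: the Node layout (only the member name _key
appears in the notes), the function names bstLookup, bstInsert, bstRemove
and inorder, the use of string keys with no associated value, and the
choice to ignore duplicate keys and to use the inorder successor S (rather
than the predecessor P) in the two-child case.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical node type: a key and two child pointers (no value field).
struct Node {
    std::string _key;
    Node* _left = nullptr;
    Node* _right = nullptr;
    explicit Node(const std::string& key) : _key(key) {}
};

// Lookup (non-recursive): each iteration moves one level down the tree,
// so the number of steps is proportional to the height.
Node* bstLookup(Node* root, const std::string& key) {
    Node* p = root;
    while (p != nullptr && p->_key != key)
        p = (key < p->_key) ? p->_left : p->_right;
    return p;  // nullptr if the key is not present
}

// Insert (recursive, simplest form, no rebalancing).  Returns the root of
// the updated subtree.  Assumption: a duplicate key is silently ignored.
Node* bstInsert(Node* root, const std::string& key) {
    if (root == nullptr) return new Node(key);
    if (key < root->_key)
        root->_left = bstInsert(root->_left, key);
    else if (root->_key < key)
        root->_right = bstInsert(root->_right, key);
    return root;
}

// Remove (recursive).  Returns the root of the updated subtree.
Node* bstRemove(Node* root, const std::string& key) {
    if (root == nullptr) return nullptr;  // key not present: nothing to do
    if (key < root->_key) {
        root->_left = bstRemove(root->_left, key);
    } else if (root->_key < key) {
        root->_right = bstRemove(root->_right, key);
    } else if (root->_left == nullptr || root->_right == nullptr) {
        // Cases a and b: zero or one child.  The parent adopts the only
        // child, or receives nullptr if there is none.
        Node* child = (root->_left != nullptr) ? root->_left : root->_right;
        delete root;
        return child;
    } else {
        // Case c: two children.  S is the leftmost node of D's right subtree.
        Node* s = root->_right;
        while (s->_left != nullptr) s = s->_left;
        assert(s->_left == nullptr);  // S cannot have a left child
        root->_key = s->_key;         // copy S's data up into D
        root->_right = bstRemove(root->_right, root->_key);  // then remove S
    }
    return root;
}

// Inorder traversal: appends the keys to out in ascending order.
void inorder(const Node* root, std::vector<std::string>& out) {
    if (root == nullptr) return;
    inorder(root->_left, out);
    out.push_back(root->_key);
    inorder(root->_right, out);
}
```

Inserting DOG, BISON, FOX, AARDVARK, CAT in that order with bstInsert
builds the example tree from section II.  Removing DOG then exercises the
two-child case: FOX (the inorder successor) is copied into the root and the
old FOX node is removed as a leaf.  The sketch never frees the whole tree; a
real implementation would also need a destructor or cleanup routine.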